(57) Abstract

A subscriber characterization and advertisement monitoring system (100) is presented in which subscriber viewing habits are
monitored to determine demographic profiles. These profiles can be utilized for the matching of advertisements to subscribers
based on their viewing habits and estimated demographics and product interests. The system (100) can be run locally in a
television set (1808) or can be used in a client-server mode, in which channel selections are transmitted from the residence
(1800) to a centralized switching location (1840) (the server), such as a telephone central office or an Internet service
provider. In the client-server mode, channel selections are monitored at the centralized switching location (1840), where the
subscriber profile is also determined. The system also makes it possible to monitor whether advertisements are watched and
for how long.

TITLE

Subscriber characterization and 
advertisement monitoring system 

Background of the Invention 
Cable television service providers have typically 
provided one-way broadcast services but now offer high-speed 
data services and can combine traditional analog broadcasts 
with digital broadcasts and access to Internet web sites. 
Telephone companies can offer digital data and video 
programming on a switched basis over digital subscriber line 
technology. Although the subscriber may only be presented 
with one channel at a time, channel change requests are 
instantaneously transmitted to centralized switching 
equipment and the subscriber can access the programming in a 
broadcast-like manner. Internet Service Providers (ISPs) 
offer Internet access and can offer access to text, audio, 
and video programming which can also be delivered in a 
broadcast-like manner in which the subscriber selects 
"channels" containing programming of interest. Such 
channels may be offered as part of a video programming 
service or within a data service and can be presented within 
an Internet browser. 

Advertisements are a part of daily life and certainly 
an important part of entertainment programming, where the 
payments for advertisements cover the cost of network 
television. A method that provides a flexible billing plan
to cable network users based on the amount of advertisements
viewed is described in U.S. Patent No. 5,532,735, which
discloses a method of advertisement selection for
interactive services. A user associated with an interactive
TV is presented with a program and a set of advertisements.
The user can indicate the amount of advertisements in the
set of advertisements he wants to view.

While advertisements are sometimes beneficial to 
subscribers and deliver desired information regarding 
specific products or services, consumers generally view 
advertising as a "necessary evil" for broadcast-type
entertainment. For example, a method for obtaining 
information on advertised services or products is described 
in U.S. Patent No. 5,708,478, which discloses a computer 
system for enabling radio listeners and television watchers 
to obtain advertising information. The system includes steps 
of determining whether an incoming video or audio signal 
includes advertisement specific data of an advertiser and 
capturing and storing the advertiser specific data.

Manufacturers pay an extremely high price to present,
in 30 seconds or less, an advertisement for their product,
which they hope a consumer will watch. Unfortunately for the
manufacturer, the consumer frequently uses that interval of
time to check the programming being presented on the other
channels, and may not watch any of the advertisement.
Alternatively, the consumer may mute the channel and ignore
what the manufacturer has presented. In any case the
probability that the consumer has watched the advertisement
is quite low. It is not until millions of dollars have been
spent on an advertising campaign that a manufacturer can
determine that the ads have been effective. This is
presently accomplished by monitoring sales of the product or
TV programs or channels viewed by users as disclosed in
various public documents. As an example, U.S. Patent No.
4,546,382 discloses a television and market research data
collection system and method. A data collection unit
containing a memory stores data as to which of the plurality
of TV modes are in use, which TV channel is being viewed as
well as input from a suitable optical scanning device for
collecting information about users' product purchases.
Another system described in U.S. Patent No. 4,258,386
discloses a television audience measuring system. The system
monitors and stores information representative of channel
identification, the time at which the channel is selected
and the time at which the selection of a channel is
terminated. U.S. Patent No. 5,608,445 also discloses a
method and device for data capture in television viewer
research. Devices are attached to a video installation in
order to determine to which channel a set is tuned.

With the advent of the Internet, manufacturers and
service providers have found ways to selectively insert
their advertisements based on a subscriber's requests for
information. As an example, an individual who searches for
"cars" on the Internet may see an advertisement for a
particular type of car. Various Internet-based advertising
systems use this method. The product literature from IMGIS
Inc., "Ad Force," printed from the World Wide Web site
http://www.starpt.com/core/ad_Target.html on June 30, 1998
discloses an ad targeting system. The system delivers ads to
web site visitors based on the content of the web page,
time of day, day of the week, keyword, the number of
times a visitor sees an advertisement and the order in
which a series of advertisements is shown to a visitor.
Nevertheless, unless the subscriber actually goes to the
advertised web site, there is no way to determine if the
advertisement has been watched. As the content on the
Internet migrates to multimedia programming including audio
and video, the costs for the advertising will increase, but
unless the advertiser can be sure that a significant
percentage of the message was watched or observed, the
advertising is ineffective. Prior art products for
generating reports on ad campaigns are generally PC-centric
as described in various product literature, which includes the
product literature from Doubleclick Inc., "Doubleclick:
Reporting," printed from the World Wide Web (WWW) site
http://www.doubleclick.net/dart/howi_repo.htm on June 19,
1998, which discloses the reporting capabilities of
Doubleclick's Dynamic Advertising Reporting & Targeting
(DART) product. The information in the reports includes
daily impressions by advertisement type, average impressions
per day of week and by hour of day. The average response
rate per user is also included in the reports. The product
literature from Netgravity Inc., "AdServer 3," printed from
the World Wide Web site http://www.netgravity.com/products/
on July 9, 1998 discloses Netgravity's AdServer 3 product
for online advertisement. The product generates reports
including the profiles of visitors who viewed an ad and site
traffic throughout the day, week, month and year.

The product literature from Media Metrix, "Frequently
Asked Questions," printed from the World Wide Web site
http://www.mediametrix.com/interact_mmfaq.htm on June 30,
1998 discloses Media Metrix software, PC Meter, that runs in
the background of a PC and monitors everything being done on
that machine. It determines who is using the PC by age,
income, gender and geographic region and tracks usage of
software applications, commercial online services and
detailed page level viewing of the World Wide Web. The
marketing literature from Matchlogic Inc., "Centralized Ad
Management," printed from the World Wide Web site
http://www.matchlogic.com/docs/services2.htm on July 1, 1998
discloses Matchlogic services for ad management. The
services include delivering advertisements based on
pre-defined targeting criteria and generating reports on how
many unique viewers saw which banner and how many times it
was viewed. The product literature from Accipiter Inc.,
"Accipiter AdManager 2.0," printed from the World Wide Web
site http://www.accipiter.com/products/ADManager/fab.html on
July 9, 1998 discloses Accipiter's ad management system.
After delivering an advertisement based on pre-defined
criteria, the system can generate reports on an ad campaign.
The reports include visitors' demographic data, number of
impressions and clicks generated from the entire site and by
each ad and advertiser.

In order to deliver more targeted programming and
advertising to subscribers, it is necessary to understand
their likes and dislikes to a greater extent than is
presently done. Systems which identify subscriber
preferences based on their purchases and responses to
questionnaires allow for the targeted marketing of
literature in the mail, but do not in any sense allow for
the rapid and precise delivery of programming and
advertising which is known to have a high probability of
acceptance to the subscriber. Other systems give users the
possibility to choose their programming as described in U.S.
Patent No. 5,223,924 which discloses a system and method for
automatically correlating user preferences with a TV program
information database. The system includes a processor that
performs "free text" search techniques to correlate the
downloaded TV program information with the viewer's
preferences. This system requires an interaction between the
users and the programming. The white paper from Net
Perceptions corporation entitled "Adding Value in the
Digital Age" and printed from the World Wide Web site
http://www.netperceptions.com/products/white-papers.html on
June 30, 1998 discloses how the GroupLens Recommendation
Engine gives online businesses the ability to target and
personalize services, content, products and advertising. A
learning process learns personal information about an
individual using explicit and implicit ratings, a prediction
process predicts user preference using collaborative
filtering and the recommendation process recommends products
or services to users based on predictions.

The product literature from Aptex Software Inc.,
"SelectCast for Commerce Servers," printed from the World
Wide Web site
http://www.aptex.com/products-selectcast-commerce.htm
on June 30, 1998 describes the product
SelectCast for Commerce Servers. It personalizes online
shopping based on observed user behavior. User interests are
learned based on the content they browse, the promotions
they click and the products they purchase.

In order to determine which programming or advertising
is appropriate for the subscriber, knowledge of that
subscriber and the subscriber's product and programming
preferences is required. Different methods are being used to
gain knowledge of users' preferences and to profile the
users. Generally, these methods use content or data mining
technologies to profile users or predict their preferences.
Another technique for predicting users' preferences is based
on the use of collaborative filtering as described in U.S.
Patent No. 5,704,017 which discloses a collaborative
filtering system utilizing a belief network. The system
learns a belief network using prior knowledge obtained from
an expert in a given field of decision making and a database
containing empirical data such as users' attributes as well
as their preferences in that decision making field. The
belief network can determine the probability of the unknown
preferences of the user given the known attributes and thus
predicts the preference most likely to be desired by the
user.

The product literature from Aptex Software Inc.,
"SelectCast for Ad Servers," printed from the World Wide Web
site http://www.aptex.com/products-selectcast-ads.htm on
June 30, 1998 discloses an ad targeting system from Aptex
Software Inc. The system employs neural networks and a
context vector data model to optimize relationships between
users and content. It provides user profiling by mining the
context and content of all actions including clicks,
queries, page views and ad impressions. Aptex's technology
uses a context vector data modeling technique described in
U.S. Patent No. 5,619,709 which discloses a system and
method of context vector generation and retrieval. Context
vectors represent conceptual relationships among information
items by quantitative means. A neural network operates on a
training corpus of records to develop relationship-based
context vectors based on word proximity and co-importance.
Geometric relationships among context vectors are
representative of conceptual relationships among their
associated items.

The product data sheet from Open Sesame, "Learn
Sesame," printed from the World Wide Web site
http://www.opensesame.com/prod_04.html on July 09, 1998
discloses Open Sesame's personalization product for Web
enterprises. It learns about users automatically from their
browsing behavior.

The product literature from Engage Technologies,
"Engage.Discover," printed from the World Wide Web site
http://www.engagetech.com on July 09, 1998 discloses Engage
Technologies' product for user profiling. User-disclosed
information such as interests, demographics and opinions is
combined with anonymous clickstream data that describes
where users come from before visiting the site, how long
they stay, and what pages or types of pages they visit most
frequently to build the visitor profile.

The marketing literature from Broadvision, "The Power
of Personalization," printed from the World Wide Web site
http://www.broadvision.com/content/corporate/brochure/Broch4.htm
on August 21, 1998 discloses the Broadvision One-to-One
application profiling system. The system learns about users
through a variety of techniques including registration,
questionnaires, observation and integration of historical
and externally generated data.

The marketing literature from Firefly Corporation,
"Firefly Passport Office," printed from the World Wide Web
site http://www.firefly.net/company/PassportOffice.html on
June 20, 1998 discloses Firefly's Relationship Management
software. The software enables online businesses to create,
extend and manage personal profiles for every user.

Specific information regarding a subscriber's viewing 
habits or the Internet web sites they have accessed can be 
stored for analysis, but such records are considered private 
and subscribers are not generally willing to have such 
information leave their control. Although there are
regulatory models that permit the collection of such data
on a "notice and consent" basis, there is a general tendency
towards legal rules that prohibit such raw data from being
collected.

With the migration of services from a broadcast based 
model to a client-server based model in which subscribers 
make individualized requests for programming to an Internet
access provider or content provider, there is opportunity to 
monitor the subscriber viewing characteristics to better 
provide them with programming and advertising which will be 
of interest to them. A server may act as a proxy for the 
subscriber requests and thus be able to monitor what a 
subscriber has requested and is viewing. Since subscribers 
may not want this raw data to be utilized, there is a need 
for a system which can process this information and generate 
statistically relevant subscriber profiles. These profiles 
should be accessible to others on the network who may wish 
to determine if their programming or advertisements are 
suitable for the subscriber. In a broadcast-based model, the
information to be processed can be embedded within the TV

WO 00/33160 PCT/US99/28528

program or broadcast separately and can be in the form of an
electronic program guide (EPG) or text information related 
to the program. As an example, U.S. Patent No. 5,579,055 
discloses an electronic program guide (EPG) and text channel 
data controller. The text and EPG data are embedded in the
vertical blanking interval of the video signal and
extracted, at reception, by the data controller. The EPG
contains information fields such as program category,
program subcategory and program content description. U.S.
Patent No. 5,596,373 also discloses a method and apparatus
for providing program oriented information in a multiple
station broadcasting system. The EPG data includes guide
data, channel data and program data. The program data
includes, among other information, the program title, the
program category, the program sub-category and a detailed
description of the program.

For the foregoing reasons, there is a need for an 
advertisement monitoring system which can monitor which 
advertisements have been viewed by a subscriber. There is
also a need for a subscriber characterization system which
can generate and store subscriber characteristics which
reflect the probable demographics and preferences of the
subscriber and household.

Summary Of The Invention

The present invention encompasses a system for
determining to what extent an advertisement has been viewed
by a subscriber or household.

In a preferred embodiment subscriber selection data
including the channel selected and the time at which it was
selected are recorded. Advertisement related information
including the type of product, brand name, and other
descriptive information which categorizes the advertisement
is extracted from the advertisement or text information
related to the advertisement including closed captioning
text. Based on the subscriber selection data a record of
what percentage of the advertisement was watched is created.
This record can subsequently be used to make a measure of
the effectiveness of the advertisement.
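
As a rough illustration of how the percentage watched could be derived from
the recorded channel selections, the following Python sketch intersects each
channel selection interval with the time slot of an advertisement. The names
ChannelEvent and viewed_fraction, the use of seconds as the time unit, and the
assumption that the advertisement's channel and start and end times are
already known are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class ChannelEvent:
    """One subscriber selection record: the channel tuned and when."""
    time: float      # seconds since the start of the log (assumed unit)
    channel: int


def viewed_fraction(events, ad_channel, ad_start, ad_end, session_end):
    """Return the fraction of [ad_start, ad_end) spent tuned to ad_channel."""
    if ad_end <= ad_start:
        raise ValueError("ad_end must be after ad_start")
    events = sorted(events, key=lambda e: e.time)
    watched = 0.0
    for i, ev in enumerate(events):
        # A selection lasts until the next selection (or the end of the log).
        until = events[i + 1].time if i + 1 < len(events) else session_end
        if ev.channel != ad_channel:
            continue
        overlap = min(until, ad_end) - max(ev.time, ad_start)
        if overlap > 0:
            watched += overlap
    return watched / (ad_end - ad_start)


# Hypothetical log: channel 4, then channel 7 at t=100, back to 4 at t=115.
log = [ChannelEvent(0, 4), ChannelEvent(100, 7), ChannelEvent(115, 4)]
fraction = viewed_fraction(log, ad_channel=7, ad_start=90, ad_end=120,
                           session_end=200)
print(f"{fraction:.0%} of the advertisement was on screen")
```

In this sketch the subscriber tunes to the advertisement's channel for 15 of
its 30 seconds, so the record would show one half of the advertisement as
watched. Muting, which the background section mentions, is not modeled here.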

In a preferred embodiment the text information related 
to the advertisement is processed using context mining 
techniques which allow for classification of the 
advertisement and extraction of key data including product 
type and brand. Context mining techniques allow for 
determination of a product type, product brand name and in 
the case of a product which is not sold with a particular 
brand name, a generic name for the product. 

The present invention can also be realized in a client- 
server mode in which case the subscriber executes channel 
changes at the client side of the network which are 
transmitted to the server side and fulfilled by the routing 
of a channel to the subscriber. The server side monitors 
the subscriber activity and stores the record of channel 
change requests. Advertisement related information is 
retrieved from the server side, which contains the 
advertising material itself, retrieves the advertising 
material from a third party, or analyzes the data stream 
carrying the advertising to the subscriber. The server side 
extracts descriptive fields from the advertisement and based 
on the subscriber selection data, determines the extent to 
which the advertisement was viewed by the subscriber. As an 
example the system can determine the percentage of the 
advertisement that was viewed by the subscriber. 

The present invention includes a system for 
characterizing subscribers watching video or multimedia 
programming based on monitoring their detailed selection 
choices including the time duration of their viewing, the
volume at which the programming is listened to, the program
selection, and collecting text information about that
programming to determine what type of programming the
subscriber is most interested in. In addition, the system
can generate a demographic description of the subscriber or
household which describes the probable age, income, gender
and other demographics. The resulting characterization
includes probabilistic determinations of what other
programming or products the subscriber/household will be
interested in.

In a preferred embodiment, the textual information
which describes the programming is obtained by context
mining of text associated with the programming. The
associated text can be from the closed-captioning data
associated with the programming, an electronic program
guide, or from text files associated with or part of the
programming itself.

The system can provide both session measurements, which
correspond to a profile obtained over a viewing session, and
an average profile, which corresponds to data obtained over
multiple viewing sessions.

The present invention also encompasses the use of
heuristic rules in logical form or expressed as conditional
probabilities to aid in forming a subscriber profile. The
heuristic rules in logical form allow the system to apply
generalizations which have been learned from external
studies to obtain a characterization of the subscriber. In
the case of conditional probabilities, determinations of the
probable content of a program can be applied in a
mathematical step to a matrix of conditional probabilities
to obtain probabilistic subscriber profiles indicating
program and product likes and dislikes as well as for
determining probabilistic demographic data.
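
One plausible reading of the "mathematical step" is a vector-matrix product
that marginalizes over program categories. The Python sketch below is only an
illustration under that assumption: the categories, the age groups, and every
probability value are invented for the example and do not come from the
specification or from any study.

```python
# Hypothetical program categories and demographic groups, for illustration only.
CATEGORIES = ["news", "sports", "cartoons"]
AGE_GROUPS = ["under 18", "18-49", "50+"]

# P(age group | category): one row per category; each row sums to 1.
# The numbers are invented placeholders.
P_AGE_GIVEN_CATEGORY = [
    [0.05, 0.45, 0.50],   # news
    [0.15, 0.60, 0.25],   # sports
    [0.70, 0.25, 0.05],   # cartoons
]


def demographic_profile(category_probs, cond_matrix):
    """Marginalize over categories: P(group) = sum_c P(group | c) * P(c)."""
    if abs(sum(category_probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    n_groups = len(cond_matrix[0])
    return [
        sum(p_c * row[g] for p_c, row in zip(category_probs, cond_matrix))
        for g in range(n_groups)
    ]


# Probabilistic program category vector: the program is probably sports.
program_vector = [0.1, 0.8, 0.1]
profile = demographic_profile(program_vector, P_AGE_GIVEN_CATEGORY)
for group, p in zip(AGE_GROUPS, profile):
    print(f"{group}: {p:.3f}")
```

Because each row of the matrix and the category vector both sum to one, the
resulting demographic vector also sums to one and can be read as a
probability distribution over the assumed age groups.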

One advantage of the present invention is that it
allows consumers the possibility to permit access to
probabilistic information regarding their household
demographics and programming/product preferences, without
revealing their specific viewing history. Subscribers may
elect to permit access to this information in order to
receive advertising which is more targeted to their
likes/dislikes. Similarly, a subscriber may wish to sell
access to this statistical data in order to receive revenue
or receive a discount on a product or a service.

Another advantage of the present invention is that the 
resulting probabilistic information can be stored locally 
and controlled by the subscriber, or can be transferred to a 
third party which can provide access to the subscriber 
characterization. The information can also be encrypted to 
prevent unauthorized access in which case only the 
subscriber or someone authorized by the subscriber can 
access the data. 

The present invention also includes a system for
characterizing subscribers watching video or multimedia
programming based on monitoring the requests made by the
subscriber for programming to a server which contains the
content or which requests the content from a third party.
The server side of the network is able to monitor the
subscriber's detailed selection choices including the time
duration of their viewing, the volume at which the
programming is listened to, and the program selection.

The server side collects text information about that 
programming to determine what type of programming the 
subscriber is most interested in. In addition the system 
can generate a demographic description of the subscriber or 
household which describes the probable age, income, gender 
and other demographics. The resulting characterization 
includes probabilistic determinations of what other
programming or products the subscriber/household will be
interested in.

These and other features and objects of the invention 
will be more fully understood from the following detailed 
description of the preferred embodiments which should be 
read in light of the accompanying drawings. 

Brief Description of the Drawings 

The accompanying drawings, which are incorporated in 
and form a part of the specification, illustrate the 
embodiments of the present invention and, together with the 
description serve to explain the principles of the 
invention. 

In the drawings:

FIG. 1 shows a context diagram for a subscriber
characterization system;

FIG. 2 illustrates a block diagram for a realization of 
a subscriber monitoring system for receiving video signals; 

FIG. 3 illustrates a block diagram of a channel 
processor; 

FIG. 4 illustrates a block diagram of a computer for a 
realization of the subscriber monitoring system; 

FIG. 5 illustrates a channel sequence and volume over a 
twenty-four (24) hour period; 

FIG. 6 illustrates a time of day detailed record; 

FIG. 7 illustrates a household viewing habits 
statistical table; 

FIG. 8A illustrates an entity-relationship diagram for
the generation of program characteristics vectors;

FIG. 8B illustrates a flowchart for program
characterization;

FIG. 9A illustrates a deterministic program category
vector;

FIG. 9B illustrates a deterministic program sub-category
vector;

FIG. 9C illustrates a deterministic program rating 
vector; 

FIG. 9D illustrates a probabilistic program category
vector;

FIG. 9E illustrates a probabilistic program sub-category
vector;

FIG. 9F illustrates a probabilistic program content
vector;

FIG. 10A illustrates a set of logical heuristic rules; 

FIG. 10B illustrates a set of heuristic rules expressed 
in terms of conditional probabilities; 

FIG. 11 illustrates an entity-relationship diagram for
the generation of program demographic vectors; 

FIG. 12 illustrates a program demographic vector; 

FIG. 13 illustrates an entity-relationship diagram for 
the generation of household session demographic data and 
household session interest profiles; 

FIG. 14 illustrates an entity-relationship diagram for
the generation of average and session household demographic
characteristics;

FIG. 15 illustrates average and session household 
demographic data; 

FIG. 16 illustrates an entity-relationship diagram for
generation of a household interest profile; 

FIG. 17 illustrates a household interest profile
including programming and product profiles;

FIG. 18 illustrates a client-server architecture for 
realizing the present invention; and 

FIG. 19 illustrates an advertisement monitoring table.

Detailed Description
Of The Preferred Embodiment
In describing a preferred embodiment of the invention
illustrated in the drawings, specific terminology will be
used for the sake of clarity. However, the invention is not
intended to be limited to the specific terms so selected,
and it is to be understood that each specific term includes
all technical equivalents which operate in a similar manner
to accomplish a similar purpose.

With reference to the drawings, in general, and FIGS. 1
through 19 in particular, the apparatus of the present
invention is disclosed.

The present invention is directed at an apparatus for
monitoring which advertisements are watched by a subscriber
or a household.

In the present system the programming viewed by
the subscriber, both entertainment and advertisement, can be
studied and processed by the subscriber characterization
system to determine the program characteristics. The result
of this determination is referred to
as a program characteristics vector. The vector may be a
truly one-dimensional vector, but can also be represented as
an n-dimensional matrix which can be decomposed into
vectors. For advertisements, the program characteristics
vector can contain information regarding the advertisement
including product type, features, brand or generic name, or
other relevant advertising information.

The subscriber profile vector represents a profile of
the subscriber (or the household of subscribers) and can be
in the form of a demographic profile (average or session) or
a program or product preference vector. The program and
product preference vectors are considered to be part of a
household interest profile which can be thought of as an
n-dimensional matrix representing probabilistic measurements
of subscriber interests.

In the case that the subscriber profile vector is a 
demographic profile, the subscriber profile vector indicates 
a probabilistic measure of the age of the subscriber or 
average age of the viewers in the household, sex of the 
subscriber, income range of the subscriber or household, and 
other such demographic data. Such information comprises 
household demographic characteristics and is composed of 
both average and session values. Extracting a single set of 
values from the household demographic characteristics can 
correspond to a subscriber profile vector. 

The household interest profile can contain both
programming and product profiles, with programming profiles
corresponding to probabilistic determinations of what
programming the subscriber (household) is likely to be
interested in, and product profiles corresponding to what
products the subscriber (household) is likely to be
interested in. These profiles contain both an average value
and a session value, the average value being a time average
of data, where the averaging period may be several days,
weeks, months, or the time between resets of the unit.
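
The relationship between session values and average values can be sketched
as a running mean that is cleared when the unit is reset. The Python class
below is a minimal illustration, assuming that every session carries equal
weight in the average; the specification does not state the weighting, and
the class and method names are hypothetical.

```python
class HouseholdProfile:
    """Keeps the latest session profile and a running average over sessions."""

    def __init__(self, size):
        self.size = size
        self.reset()

    def reset(self):
        """Start a new averaging period (for example after a unit reset)."""
        self.sessions = 0
        self.session = [0.0] * self.size
        self.average = [0.0] * self.size

    def add_session(self, session_values):
        if len(session_values) != self.size:
            raise ValueError("unexpected profile length")
        self.sessions += 1
        self.session = list(session_values)
        # Incremental mean: avg += (x - avg) / n, so no history is stored.
        self.average = [
            avg + (x - avg) / self.sessions
            for avg, x in zip(self.average, session_values)
        ]


# Hypothetical two-element interest profile, e.g. [sports, news].
profile = HouseholdProfile(size=2)
profile.add_session([0.9, 0.1])
profile.add_session([0.3, 0.7])
print(profile.session, profile.average)
```

Storing only the running mean and the session count, rather than every
session, is a design choice made for this sketch; it keeps memory use
constant regardless of the length of the averaging period.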

Since a viewing session is likely to be dominated by a 
particular viewer, the session values may, in some 
circumstances, correspond most closely to the subscriber 
values, while the average values may, in some circumstances,
correspond most closely to the household values. 

FIG. 1 depicts the context diagram of a preferred
embodiment of a Subscriber Characterization System (SCS)
100. A context diagram, in combination with
entity-relationship diagrams, provides a basis from which one
skilled in the art can realize the present invention. The
present invention can be realized in a number of programming
languages including C, C++, Perl, and Java, although the
scope of the invention is not limited by the choice of a
particular programming language or tool. Object oriented
languages have several advantages in terms of construction
of the software used to realize the present invention,
although the present invention can be realized in procedural
or other types of programming languages known to those
skilled in the art.

In generating a subscriber profile, the SCS 100 
receives from a user 120 commands in the form of a volume 
control signal 124 or program selection data 122 which can
be in the form of a channel change but may also be an
address request which requests the delivery of programming
from a network address. A record signal 126 indicates that
the programming or the address of the programming is being
recorded by the user. The record signal 126 can also be a
printing command, a tape recording command, a bookmark
command or any other command intended to store the program 
being viewed, or program address, for later use. 

The material being viewed by the user 120 is referred 
to as source material 130. The source material 130, as
defined herein, is the content that a subscriber selects and
may consist of analog video, Moving Picture Experts Group
(MPEG) digital video source material, other digital or
analog material, Hypertext Markup Language (HTML) or other
type of multimedia source material. The subscriber
characterization system 100 can access the source material
130 received by the user 120 using a start signal 132 and a 
stop signal 134, which control the transfer of source 
related text 136 which can be analyzed as described herein. 

In a preferred embodiment, the source related text 136 
can be extracted from the source material 130 and stored in
memory. The source related text 136, as defined herein,
includes source related textual information including
descriptive fields which are related to the source material
130, or text which is part of the source material 130
itself. The source related text 136 can be derived from a
number of sources including but not limited to closed
captioning information, Electronic Program Guide (EPG)
material, and text information in the source itself (e.g.
text in HTML files).

An Electronic Program Guide (EPG) 140 contains information
related to the source material 130 which is useful to the
user 120. The EPG 140 is typically a navigational tool
which contains source related information including but not
limited to the programming category, program description,
rating, actors, and duration. The structure and content of
EPG data is described in detail in US Patent 5,596,373
assigned to Sony Corporation and Sony Electronics which is
herein incorporated by reference. As shown in FIG. 1, the
EPG 140 can be accessed by the SCS 100 by a request EPG data
signal 142 which results in the return of a category 144, a
sub-category 146, and a program description 148. EPG
information can potentially include fields related to
advertising.

In one embodiment of the present invention, EPG data is
accessed and program information such as the category 144,
the sub-category 146, and the program description 148 is
stored in memory.

In another embodiment of the present invention, the
source related text 136 is the closed captioning text
embedded in the analog or digital video signal. Such closed
captioning text can be stored in memory for processing to
extract the program characteristics vectors 150.

One of the functions of the SCS 100 is to generate the
program characteristics vectors 150 which are comprised of
program characteristics data 152, as illustrated in FIG. 1.
The program characteristics data 152, which can be used to
create the program characteristics vectors 150 both in
vector and table form, are examples of source related
information which represent characteristics of the source
material. In a preferred embodiment, the program
characteristics vectors 150 are lists of values which
characterize the programming (source) material according
to the category 144, the sub-category 146, and the program
description 148. The present invention may also be applied
to advertisements, in which case program characteristics
vectors contain, as an example, a product category, a
product sub-category, and a brand name.

As illustrated in FIG. 1, the SCS 100 uses heuristic
rules 160. The heuristic rules 160, as described herein,
are composed of both logical heuristic rules as well as
heuristic rules expressed in terms of conditional
probabilities. The heuristic rules 160 can be accessed by
the SCS 100 via a request rules signal 162 which results in
the transfer of a copy of rules 164 to the SCS 100.

The SCS 100 forms program demographic vectors 170 from 
program demographics 172, as illustrated in FIG. 1. The 
program demographic vectors 170 also represent
characteristics of source related information, in the form of
the expected demographics of the audience for which the
source material is intended.

Subscriber selection data 110 is obtained from the
monitored activities of the user and in a preferred
embodiment can be stored in a dedicated memory. In an
alternate embodiment, the subscriber selection data 110 is
stored on a storage disk. Information which is utilized to
form the subscriber selection data 110 includes time 112,
which corresponds to the time of an event, channel ID 114,
program ID 116, volume level 118, channel change record 119,
and program title 117. A detailed record of selection data
is illustrated in FIG. 6.
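
By way of illustration only, one event of the subscriber
selection data 110 could be held in a structure such as the
following Python sketch. The class name, field names,
helper function, and sample values are assumptions made for
this example and are not taken from FIG. 6.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class SelectionEvent:
    """One monitored event (hypothetical layout for illustration)."""
    time: datetime        # time 112 of the event
    channel_id: str       # channel ID 114
    program_id: str       # program ID 116
    program_title: str    # program title 117
    volume_level: int     # volume level 118


def count_channel_changes(events: List[SelectionEvent]) -> int:
    """Count transitions between different channels, one possible
    way of deriving the channel change record 119."""
    return sum(
        1
        for prev, cur in zip(events, events[1:])
        if cur.channel_id != prev.channel_id
    )


# Made-up sample data, ordered by time.
events = [
    SelectionEvent(datetime(1999, 1, 1, 8, 0), "5", "p1", "Morning News", 12),
    SelectionEvent(datetime(1999, 1, 1, 8, 30), "7", "p2", "Cartoons", 10),
    SelectionEvent(datetime(1999, 1, 1, 9, 0), "7", "p3", "Cooking Show", 10),
]
changes = count_channel_changes(events)
```

In this sample the title changes twice but the channel
changes only once, so a program change on the same channel
is not counted as a channel change.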

In a preferred embodiment, the household viewing habits
195 illustrated in FIG. 1 are computed from the subscriber
selection data 110. The SCS 100 transfers household viewing
data 197 to form the household viewing habits 195. The
household viewing data 197 is derived from the subscriber
selection data 110 by looking at viewing habits at a
particular time of day over an extended period of time,
usually several days or weeks, and making some
generalizations regarding the viewing habits during that
time period.

The program characteristics vector 150 is derived from 
the source related text 136 and/or from the EPG 140 by 
applying information retrieval techniques. The details of 
this process are discussed in accordance with FIG. 8. 

The program characteristics vector 150 is used in 
combination with a set of the heuristic rules 160 to define 
a set of the program demographic vectors 170 illustrated in 
FIG. 1 describing the audience the program is intended for. 

One output of the SCS 100 is a household profile 
including household demographic characteristics 190 and a 
household interest profile 180. The household demographic
characteristics 190 result from the transfer of household
demographic data 192, and the household interest profile
180 results from the transfer of household interests data
182. Both the household demographic characteristics 190
and the household interest profile 180 have a session value
and an average value, as will be discussed herein.

The monitoring system depicted in FIG. 2 is responsible 
for monitoring the subscriber activities, and can be used to 
realize the SCS 100. In a preferred embodiment, the 
monitoring system of FIG. 2 is located in a television
set-top device or in the television itself. In an alternate
embodiment, the monitoring system is part of a computer 
which receives programming from a network. 

In an application of the system for television
services, an input connector 220 accepts the video signal
coming from an antenna, a cable television input, or
another network. The video signal can be analog or digital
MPEG. Alternatively, the video source may be a video stream
or other multimedia stream from a communications network
including the Internet.

In the case of either analog or digital video, selected 
fields are defined to carry EPG data or closed captioning 
text. For analog video, the closed captioning text is 
embedded in the vertical blanking interval (VBI). As
described in US Patent 5,579,005, assigned to
Scientific-Atlanta, Inc., the EPG information can be carried
in a dedicated channel or embedded in the VBI. For digital
video, the closed captioning text is carried as video user
bits in a user_data field. The EPG data is transmitted as
ancillary data and is multiplexed at the transport layer
with the audio and video data. 

Referring to FIG. 2, a system control unit 200 receives
commands from the user 120, decodes each command, and
forwards it to the destination module. In a preferred
embodiment, the commands are entered via a remote control to 
a remote receiver 205 or a set of selection buttons 207 
available at the front panel of the system control unit 200. 
In an alternate embodiment, the commands are entered by the 
user 120 via a keyboard.

The system control unit 200 also contains a Central
Processing Unit (CPU) 203 for processing and supervising all
of the operations of the system control unit 200, a Read
Only Memory (ROM) 202 containing the software and fixed
data, and a Random Access Memory (RAM) 204 for storing data.
The CPU 203, RAM 204, ROM 202, and I/O controller 201 are
attached to a master bus 206. A power supply in the form of a
battery can also be included in the system control unit 200
for backup in case of a power outage.

An input/output (I/O) controller 201 interfaces the
system control unit 200 with external devices. In a
preferred embodiment, the I/O controller 201 interfaces to
the remote receiver 205 and a selection button such as the
channel change button on a remote control. In an alternate
embodiment, it can accept input from a keyboard or a mouse.

The program selection data 122 is forwarded to a 
channel processor 210. The channel processor 210 tunes to a 
selected channel and the media stream is decomposed into its 
basic components: the video stream, the audio stream, and 
the data stream. The video stream is directed to a video
processor module 230 where it is decoded and further
processed for display on the TV screen. The audio stream is
directed to an audio processor 240 for decoding and output 
to the speakers. 

The data stream can be EPG data, closed captioning
text, Extended Data Service (EDS) information, a combination
of these, or an alternate type of data. In the case of EDS,
the call sign, program name and other useful data are
provided. In a preferred embodiment, the data stream is
stored in a reserved location of the RAM 204. In an
alternate embodiment, a magnetic disk is used for data
storage. The system control unit 200 also writes to a
dedicated memory, which in a preferred embodiment is the RAM
204, the selected channel, the time 112 of selection, the
volume level 118, the program ID 116, and the program
title 117. Upon receiving the program selection data 122,
the new selected channel is directed to the channel
processor 210 and the system control unit 200 writes to the
dedicated memory the channel selection end time and the
program title 117 at the time 112 of channel change. The
system control unit 200 keeps track of the number of channel
changes occurring during the viewing time via the channel
change record 119. This data forms part of the subscriber
selection data 110.

The volume control signal 124 is sent to the audio
processor 240. In a preferred embodiment, the volume level
118 selected by the user 120 corresponds to the listening
volume. In an alternate embodiment, the volume level 118
selected by the user 120 represents a volume level to
another piece of equipment such as an audio system (home
theatre system) or to the television itself. In such a
case, the volume can be measured directly by a microphone or
other audio sensing device which can monitor the volume at
which the selected source material is being listened to.

A program change occurring while watching a selected 
channel is also logged by the system control unit 200. 
Monitoring the content of the program at the time of the 
program change can be done by reading the content of the 
EDS. The EDS contains information such as program title,
which is transmitted via the VBI. A change in the program
title field is detected by the monitoring system and logged
as an event. In an alternate embodiment, an EPG is present
and program information can be extracted from the EPG. In a
preferred embodiment, the programming data received from the
EDS or EPG permits distinguishing between entertainment
programming and advertisements. 

FIG. 3 shows the block diagram of the channel processor 
210. In a preferred embodiment, the input connector 220 
connects to a tuner 300 which tunes to the selected channel.
A local oscillator can be used to heterodyne the signal to
the intermediate frequency (IF) signal. A demodulator 302
demodulates the received signal and the output is fed to a
forward error correction (FEC) decoder 304. The data stream
received from the FEC decoder 304 is, in a preferred
embodiment, in an MPEG format. In a preferred embodiment, a
system demultiplexer 306 separates out video and audio
information for subsequent decompression and processing, as 
well as ancillary data which can contain program related 
information.

The data stream presented to the system demultiplexer
306 consists of packets of data including video, audio and
ancillary data. The system demultiplexer 306 identifies each
packet from the stream ID and directs the stream to the
corresponding processor. The video data is directed to the
video processor module 230 and the audio data is directed to
the audio processor 240. The ancillary data can contain
closed captioning text, emergency messages, program guide,
or other useful information.

Closed captioning text is considered to be ancillary
data and is thus contained in the video stream. The system
demultiplexer 306 accesses the user data field of the video
stream to extract the closed captioning text. The program
guide, if present, is carried on a data stream identified by
a specific transport program identifier.
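
The routing performed by the system demultiplexer 306 can be
pictured with the following minimal Python sketch. It is an
illustration only: the stream ID values, the packet layout
as (stream ID, payload) pairs, and the function name are
hypothetical and do not reflect actual MPEG identifiers.

```python
from collections import defaultdict

# Hypothetical stream IDs, chosen only for this example.
STREAM_KINDS = {1: "video", 2: "audio", 3: "ancillary"}


def demultiplex(packets, stream_kinds=STREAM_KINDS):
    """Group packet payloads by the kind of stream they belong to.

    packets: iterable of (stream_id, payload) pairs.
    Packets with an unknown stream ID are skipped in this sketch.
    """
    routed = defaultdict(list)
    for stream_id, payload in packets:
        kind = stream_kinds.get(stream_id)
        if kind is not None:
            routed[kind].append(payload)
    return dict(routed)


routed = demultiplex(
    [(1, b"v0"), (2, b"a0"), (3, b"guide"), (1, b"v1"), (9, b"?")]
)
```

Here the "video" payloads would go to the video processor
module 230, the "audio" payloads to the audio processor 240,
and the "ancillary" payloads to memory for later analysis.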

In an alternate embodiment, analog video can be used. 
For analog programming, ancillary data such as closed 
captioning text or EDS data are carried in a vertical 
blanking interval. 

FIG. 4 shows the block diagram of a computer system for
a realization of the subscriber monitoring system based on
the reception of multimedia signals from a bi-directional
network. A system bus 422 transports data amongst the CPU
203, the RAM 204, Read Only Memory - Basic Input Output
System (ROM-BIOS) 406 and other components. The CPU 203
accesses a hard drive 400 through a disk controller 402. The
standard input/output devices are connected to the system
bus 422 through the I/O controller 201. A keyboard is
attached to the I/O controller 201 through a keyboard port
416 and the monitor is connected through a monitor port 418.
The serial port device uses a serial port 420 to communicate
with the I/O controller 201. Industry Standard Architecture
(ISA) expansion slots 408 and Peripheral Component
Interconnect (PCI) expansion slots 410 allow additional
cards to be placed into the computer. In a preferred
embodiment, a network card is available to interface a local
area, wide area, or other network.

FIG. 5 illustrates a channel sequence and volume over a
twenty-four (24) hour period. The Y-axis represents the
status of the receiver in terms of on/off status and volume
level. The X-axis represents the time of day. The channels
viewed are represented by the windows 501-506, with a first
channel 502 being watched followed by the viewing of a
second channel 504, and a third channel 506 in the morning.
In the evening a fourth channel 501 is watched, a fifth
channel 503, and a sixth channel 505. A channel change is
illustrated by a momentary transition to the "off" status
and a volume change is represented by a change of level on
the Y-axis.

A detailed record of the subscriber selection data 110 
is illustrated in FIG. 6 in a table format. A time column 
602 contains the starting time of every event occurring 
during the viewing time. A Channel ID column 604 lists the 
channels viewed or visited during that period. A program
title column 603 contains the titles of all programs viewed.
A volume column 601 contains the volume level 118 at the 
time 112 of viewing a selected channel. 

A representative statistical record corresponding to
the household viewing habits 195 is illustrated in FIG. 7.
In a preferred embodiment, a time of day column 700 is
organized in periods of time including morning, mid-day,
afternoon, night, and late night. In an alternate
embodiment, smaller time periods are used. A minutes
watched column 702 lists, for each period of time, the time
in minutes in which the SCS 100 recorded delivery of
programming. The number of channel changes during that
period and the average volume are also included in that
table in a channel changes column 704 and an average volume
column 706 respectively. The last row of the statistical
record contains the totals for the items listed in the
minutes watched column 702, the channel changes column 704
and the average volume column 706.
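
One possible way of building such a statistical record is
sketched below in Python. The hour boundaries of each
period, the input layout, and the function names are
assumptions made for the example. As a simplification, each
viewing segment is assigned wholly to the period in which it
starts, and the average volume is weighted by minutes
watched.

```python
from collections import defaultdict

# Hypothetical period boundaries: (name, start hour, end hour).
PERIODS = [
    ("late night", 0, 6),
    ("morning", 6, 11),
    ("mid-day", 11, 14),
    ("afternoon", 14, 18),
    ("night", 18, 24),
]


def period_of(hour):
    """Return the name of the period containing the given hour."""
    for name, start, end in PERIODS:
        if start <= hour < end:
            return name
    raise ValueError(f"hour out of range: {hour}")


def viewing_habits(segments):
    """segments: iterable of (start_hour, minutes, channel_changes, volume).

    Returns {period: (minutes watched, channel changes, average volume)}.
    """
    acc = defaultdict(lambda: {"minutes": 0, "changes": 0, "vol_min": 0.0})
    for hour, minutes, changes, volume in segments:
        row = acc[period_of(hour)]
        row["minutes"] += minutes
        row["changes"] += changes
        row["vol_min"] += volume * minutes
    table = {}
    for name, row in acc.items():
        avg = row["vol_min"] / row["minutes"] if row["minutes"] else 0.0
        table[name] = (row["minutes"], row["changes"], avg)
    return table


habits = viewing_habits([(7, 30, 2, 10.0), (8, 30, 4, 20.0), (20, 60, 1, 15.0)])
```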

FIG. 8A illustrates an entity-relationship diagram for 
the generation of the program characteristics vector 150. 
The context vector generation and retrieval technique 
described in US Patent 5,619,709, which is incorporated 
herein by reference, can be applied for the generation of 
the program characteristics vectors 150. Other techniques 
are well known by those skilled in the art. 

Referring to FIG. 8A, the source material 130 or the 
EPG 140 are passed through a program characterization 
process 800 to generate the program characteristics vectors 
150. The program characterization process 800 is described 
in accordance with FIG. 8B. Program content descriptors 
including a first program content descriptor 802, a second 
program content descriptor 804 and an nth program content 
descriptor 806, each classified in terms of the category 
144, the sub-category 146, and other divisions as identified 
in the industry accepted program classification system, are 
presented to a context vector generator 820. As an example, 
the program content descriptor can be text representative of 
the expected content of material found in the particular
program category 144. In this example, the program content 
descriptors 802, 804 and 806 would contain text 
representative of what would be found in programs in the 
news, fiction, and advertising categories respectively. The 
context vector generator 820 generates context vectors for 
that set of sample texts resulting in a first summary 
context vector 808, a second summary context vector 810, and 
an nth summary context vector 812. In the example given, the 
summary context vectors 808, 810, and 812 correspond to the 
categories of news, fiction and advertising respectively.
The summary vectors are stored in a local data storage
system. 

Referring to FIG. 8B, a sample of the source related 
text 136 which is associated with the new program to be 
classified is passed to the context vector generator 820
which generates a program context vector 840 for that
program. The source related text 136 can be either the
source material 130, the EPG 140, or other text associated
with the source material. A comparison is made between the
actual program context vectors and the stored program
content context vectors by computing, in a dot product
computation process 830, the dot product of the first
summary context vector 808 with the program context vector
840 to produce a first dot product 814. Similar operations
are performed to produce second dot product 816 and nth dot
product 818. 
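
The comparison can be sketched in Python as follows. The
three-dimensional vectors and their values are made up for
the example; real context vectors would be produced by the
context vector generator 820 and would be much longer.

```python
def dot(u, v):
    """Dot product of two equal-length numeric sequences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


# Hypothetical summary context vectors, one per category.
summary_vectors = {
    "news": [1.0, 0.0, 0.0],
    "fiction": [0.0, 1.0, 0.0],
    "advertising": [0.0, 0.0, 1.0],
}

# Hypothetical program context vector for the program to classify.
program_context_vector = [0.1, 0.2, 0.9]

dot_products = {
    category: dot(vector, program_context_vector)
    for category, vector in summary_vectors.items()
}
best_category = max(dot_products, key=dot_products.get)
```

With these sample values the largest dot product is the one
for the advertising category.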

The values contained in the dot products 814, 816 and 
818, while not probabilistic in nature, can be expressed in 
probabilistic terms using a simple transformation in which 
the result represents a confidence level of assigning the
corresponding content to that program. The transformed
values add up to one. The dot products can be used to
classify a program, or form a weighted sum of
classifications which results in the program characteristics
vectors 150. In the example given, if the source related
text 136 was from an advertisement, the nth dot product 818
would have a high value, indicating that the advertising
category was the most appropriate category, and assigning a
high probability value to that category. If the dot products
corresponding to the other categories were significantly
higher than zero, those categories would be assigned a
value, with the result being the program characteristics
vectors 150 as shown in FIG. 9D. 
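
The transformation itself is not specified here. One simple
choice, assumed purely for illustration, is to clip negative
dot products to zero and divide each value by the total so
that the results add up to one, as in the following Python
sketch. The function name and sample values are
hypothetical.

```python
def to_confidences(dot_products):
    """Map raw dot products to non-negative values that sum to one.

    Assumed transformation: clip negatives to zero, then divide by
    the total. If every value is zero, fall back to equal weights.
    """
    clipped = {k: max(v, 0.0) for k, v in dot_products.items()}
    total = sum(clipped.values())
    if total == 0:
        return {k: 1.0 / len(clipped) for k in clipped}
    return {k: v / total for k, v in clipped.items()}


confidences = to_confidences(
    {"news": 0.1, "fiction": 0.2, "advertising": 0.7}
)
```

Other transformations with the same property could equally
be used.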

For the sub-categories, probabilities obtained from the
content pertaining to the same sub-category 146 are summed
to form the probability for the new program being in that
sub-category 146. At the sub-category level, the same method
is applied to compute the probability of a program being
from the given category 144. The three levels of the program
classification system, namely the category 144, the
sub-category 146 and the content, are used by the program
characterization process 800 to form the program
characteristics vectors 150 which are depicted in FIGS.
9D-9F.
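
The summation across levels can be sketched in Python as
follows. The content labels, the sub-category and category
names, and the probability values are invented for the
example, as is the function name.

```python
from collections import defaultdict


def roll_up(probabilities, parent_of):
    """Sum the probabilities of all items that share the same parent."""
    totals = defaultdict(float)
    for item, p in probabilities.items():
        totals[parent_of[item]] += p
    return dict(totals)


# Hypothetical content-level probabilities for one program.
content_probs = {
    "election report": 0.5,
    "budget debate": 0.25,
    "football match": 0.25,
}
# Hypothetical classification tree.
content_to_sub = {
    "election report": "politics",
    "budget debate": "politics",
    "football match": "sports results",
}
sub_to_category = {"politics": "news", "sports results": "sports"}

sub_probs = roll_up(content_probs, content_to_sub)
category_probs = roll_up(sub_probs, sub_to_category)
```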

The program characteristics vectors 150 in general are 
represented in FIGS. 9A through 9F. FIGS. 9A, 9B and 9C are
an example of deterministic program vectors. This set of
vectors is generated when the program characteristics are
well defined, as can occur when the source related text 136
or the EPG 140 contains specific fields identifying the
category 144 and the sub-category 146. A program rating can
also be provided by the EPG 140.

In the case that these characteristics are not
specified, a statistical set of vectors is generated from
the process described in accordance with FIG. 8. FIG. 9D
shows the probability that a program being watched is from
the given category 144. The categories are listed on the
X-axis. The sub-category 146 is also expressed in terms of
probability. This is shown in FIG. 9E. The content component
of this set of vectors is a third possible level of the
program classification, and is illustrated in FIG. 9F.

FIG. 10A illustrates sets of logical heuristic rules
which form part of the heuristic rules 160. In a preferred
embodiment, logical heuristic rules are obtained from
sociological or psychological studies. Two types of rules
are illustrated in FIG. 10A. The first type links an
individual's viewing characteristics to demographic
characteristics such as gender, age, and income level. A
channel changing rate rule 1030 attempts to determine gender
based on channel change rate. An income related channel
change rate rule 1010 attempts to link channel change rates
to income brackets. A second type of rule links particular
programs to particular audiences, as illustrated by a gender
determining rule 1050 which links the program category
144/sub-category 146 with a gender. The results of the
application of the logical heuristic rules illustrated in
FIG. 10A are probabilistic determinations of factors
including gender, age, and income level. Although a specific
set of logical heuristic rules has been used as an example,
a wide number of types of logical heuristic rules can be
used to realize the present invention. In addition, these
rules can be changed based on learning within the system or
based on external studies which provide more accurate rules.

FIG. 10B illustrates a set of the heuristic rules 160 
expressed in terms of conditional probabilities. In the 
example shown in FIG. 10B, the category 144 has associated 
with it conditional probabilities for demographic factors 
such as age, income, family size and gender composition.
The category 144 has associated with it conditional
probabilities that represent the probability that the viewing
group is within a certain age group dependent on the
probability that they are viewing a program in that category
144.

FIG. 11 illustrates an entity-relationship diagram for
the generation of the program demographic vectors 170. In a
preferred embodiment, the heuristic rules 160 are applied
along with the program characteristics vectors 150 in a
program target analysis process 1100 to form the program
demographic vectors 170. The program characteristics vectors
150 indicate a particular aspect of a program, such as its
violence level. The heuristic rules 160 indicate that a
particular demographic group has a preference for that
program. As an example, it may be the case that young males
have a higher preference for violent programs than other
sectors of the population. Thus, a program which has the
program characteristics vectors 150 indicating a high
probability of having violent content, when combined with
the heuristic rules 160 indicating that "young males like
violent programs," will result, through the program target
analysis process 1100, in the program demographic vectors
170 which indicate that there is a high probability that the
program is being watched by a young male.

The program target analysis process 1100 can be 
realized using software programmed in a variety of languages 
which processes mathematically the heuristic rules 160 to 
derive the program demographic vectors 170. The table 
representation of the heuristic rules 160 illustrated in 
FIG. 10B expresses the probability that the individual or 
household is from a specific demographic group based on a 
program with a particular category 144. This can be
expressed in probability terms as follows: "the
probability that the individuals are in a given demographic
group conditional on the program being in a given category".
Referring to FIG. 9D, the probability that the group has
certain demographic characteristics based on the program 
being in a specific category is illustrated. 

The probability that a program is destined for a specific
demographic group can be determined by applying Bayes' rule.
This probability is the sum of the conditional probabilities
that the demographic group likes the program, conditional on
the category 144, weighted by the probability that the
program is from that category 144. In a preferred
embodiment, the program target analysis can calculate the
program demographic vectors by application of logical
heuristic rules, as illustrated in FIG. 10A, and by
application of heuristic rules expressed as conditional
probabilities as shown in FIG. 10B. Logical heuristic rules
can be applied using logical programming and fuzzy logic
using techniques well understood by those skilled in the
art, and are discussed in the text by S. V. Kartalopoulos
entitled "Understanding Neural Networks and Fuzzy Logic"
which is incorporated herein by reference.

Conditional probabilities can be applied by simple
mathematical operations that multiply program context vectors
by matrices of conditional probabilities. By performing
this process over all the demographic groups, the program
target analysis process 1100 can measure how likely a
program is to be of interest to each demographic group.
Those probability values form the program demographic
vector 170 represented in FIG. 12.

As an example, the heuristic rules expressed as
conditional probabilities shown in FIG. 10B are used as part
of a matrix multiplication in which the program
characteristics vector 150 of dimension N, such as those
shown in FIGS. 9A-9F, is multiplied by an N x M matrix of
heuristic rules expressed as conditional probabilities, such
as that shown in FIG. 10B. The resulting vector of
dimension M is a weighted average of the conditional
probabilities for each category and represents the household
demographic characteristics 190. Similar processing can be
performed at the sub-category and content levels.
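
The multiplication can be illustrated with the following
Python sketch, in which N = 3 categories and M = 3 age
groups. All labels and numbers are invented for the example
and are not taken from FIG. 10B. Row i of the matrix holds
the assumed conditional probabilities of each demographic
group given category i.

```python
def apply_rules(characteristics, rules):
    """Multiply a length-N vector by an N x M matrix (list of rows)."""
    n = len(characteristics)
    if len(rules) != n:
        raise ValueError("matrix must have one row per category")
    m = len(rules[0])
    return [
        sum(characteristics[i] * rules[i][j] for i in range(n))
        for j in range(m)
    ]


categories = ["news", "fiction", "advertising"]      # N = 3
age_groups = ["under 18", "18 to 49", "50 and over"]  # M = 3

# Hypothetical conditional probabilities; each row sums to one.
rules = [
    [0.1, 0.4, 0.5],  # given news
    [0.3, 0.5, 0.2],  # given fiction
    [0.2, 0.6, 0.2],  # given advertising
]

# Hypothetical probabilities that the program is in each category.
characteristics = [0.5, 0.25, 0.25]

demographic_vector = apply_rules(characteristics, rules)
```

Because the input vector and every matrix row each sum to
one, the resulting vector also sums to one and can be read
as a probability for each demographic group.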

FIG. 12 illustrates an example of the program
demographic vector 170, and shows the extent to which a
particular program is destined for a particular audience.
This is measured in terms of probability as depicted in FIG.
12. The Y-axis is the probability of appealing to the
demographic group identified on the X-axis.

FIG. 13 illustrates an entity-relationship diagram for 
the generation of household session demographic data 1310 
and household session interest profile 1320. In a preferred
embodiment, the subscriber selection data 110 is used along
with the program characteristics vectors 150 in a session
characterization process 1300 to generate the household
session interest profile 1320. The subscriber selection data
110 indicates what the subscriber is watching, for how long,
and at what volume.

In a preferred embodiment, the session characterization 
process 1300 forms a weighted average of the program 
characteristics vectors 150 in which the time duration the 
program is watched is normalized to the session time 
(typically defined as the time from which the unit was 
turned on to the present). The program characteristics
vectors 150 are multiplied by the normalized time duration 
(which is less than one unless only one program has been 
viewed) and summed with the previous value. Time duration 
data, along with other subscriber viewing information, is 
available from the subscriber selection data 110. The 
resulting weighted average of program characteristics 
vectors forms the household session interest profile 1320, 
with each program contributing to the household session 
interest profile 1320 according to how long it was watched. 
The household session interest profile 1320 is normalized to 
produce probabilistic values of the household programming 
interests during that session. 
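
A minimal Python sketch of this time-weighted average is
given below. For simplicity it processes a completed list
of viewed programs in one pass rather than updating a
running value; the function name, input layout, and sample
values are assumptions made for the example.

```python
def session_interest_profile(viewed):
    """Time-weighted average of program characteristics vectors.

    viewed: list of (minutes_watched, characteristics_vector) pairs,
    where all vectors have the same length.
    """
    total_minutes = sum(minutes for minutes, _ in viewed)
    if total_minutes <= 0:
        raise ValueError("session has no viewing time")
    profile = [0.0] * len(viewed[0][1])
    for minutes, vector in viewed:
        weight = minutes / total_minutes  # normalized time duration
        for i, value in enumerate(vector):
            profile[i] += weight * value
    return profile


# Two programs: 30 minutes of category 0, then 10 minutes of category 1.
profile = session_interest_profile([(30, [1.0, 0.0]), (10, [0.0, 1.0])])
```

If each input vector sums to one, the weighted average also
sums to one, so no further normalization is needed in this
simplified case.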

In an alternate embodiment, the heuristic rules 160 are 
applied to both the subscriber selection data 110 and the 
program characteristics vectors 150 to generate the 
household session demographic data 1310 and the household 
session interest profile 1320. In this embodiment, weighted 
averages of the program characteristics vectors 150 are 
formed based on the subscriber selection data 110, and the 
heuristic rules 160 are applied. In the case of logical 
heuristic rules as shown in FIG. 10A, logical programming 
can be applied to make determinations regarding the
household session demographic data 1310 and the household
session interest profile 1320. In the case of heuristic
rules in the form of conditional probabilities such as those
illustrated in FIG. 10B, a dot product of the time averaged
values of the program characteristics vectors can be taken
with the appropriate matrix of heuristic rules to generate 
both the household session demographic data 1310 and the 
household session interest profile 1320. 

Volume control measurements which form part of the
subscriber selection data 110 can also be applied in the
session characterization process 1300 to form a household
session interest profile 1320. This can be accomplished by
using normalized volume measurements in a weighted average
manner similar to how time duration is used. Thus, muting a
show results in a zero value for volume, and the program
characteristics vector 150 for this show will not be
averaged into the household session interest profile 1320.
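A volume-weighted average of this kind might be sketched as shown below. The 0 to 100 volume scale, the function name, and the category names are assumptions for the example; the specification does not fix a volume scale.

```python
def volume_weighted_profile(records, max_volume=100):
    """Volume-weighted average of program characteristics vectors.

    records: list of (volume, vector) pairs; volume is on an assumed
    0..max_volume scale and is normalized to the range 0..1.
    """
    totals = {}
    for volume, vector in records:
        weight = volume / max_volume
        if weight == 0:
            continue  # a muted show contributes nothing to the profile
        for category, score in vector.items():
            totals[category] = totals.get(category, 0.0) + weight * score
    norm = sum(totals.values())
    return {c: v / norm for c, v in totals.items()} if norm else {}


# Hypothetical session in which the shopping program was muted.
records = [
    (50, {"drama": 1.0}),
    (0, {"shopping": 1.0}),
    (50, {"news": 1.0}),
]
profile = volume_weighted_profile(records)
```

Here the muted shopping program is absent from the resulting profile, while drama and news each receive a value of 0.5.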

FIG. 14 illustrates an entity-relationship diagram for
the generation of average household demographic
characteristics and session household demographic
characteristics 190. A household demographic
characterization process 1400 generates the household
demographic characteristics 190 represented in table format
in FIG. 15. The household demographic characterization
process 1400 uses the household viewing habits 195 in
combination with the heuristic rules 160 to determine
demographic data. For example, a household with a number of
minutes watched of zero during the day may indicate a
household with two working adults. Both logical heuristic
rules as well as rules based on conditional probabilities
can be applied to the household viewing habits 195 to obtain
the household demographic characteristics 190.

The household viewing habits 195 are also used by the
system to detect out-of-habits events. For example, if a
household with a zero value for the minutes watched column
702 at late night presents a session value at that time via
the household session demographic data 1310, this session
will be characterized as an out-of-habits event and the
system can exclude such data from the average if it is
highly probable that the demographics for that session are
greatly different from the average demographics for the
household. Nevertheless, the results of the application of
the household demographic characterization process 1400 to
the household session demographic data 1310 can result in
valuable session demographic data, even if such data is not
added to the average demographic characterization of the
household.
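One possible way to flag an out-of-habits event and to decide whether to exclude the session from the average is sketched below. The period names, the use of total variation distance to compare demographics, and the 0.5 threshold are all assumptions chosen for illustration; the specification does not prescribe a particular test.

```python
def is_out_of_habits(habit_minutes, session_period):
    """True if the household habitually watches nothing in this period."""
    return habit_minutes.get(session_period, 0) == 0


def should_exclude(average_demo, session_demo, threshold=0.5):
    """True if session demographics differ greatly from the average.

    Uses total variation distance between the two probability
    distributions (an assumed measure, with an assumed threshold).
    """
    groups = set(average_demo) | set(session_demo)
    distance = 0.5 * sum(
        abs(average_demo.get(g, 0.0) - session_demo.get(g, 0.0))
        for g in groups
    )
    return distance > threshold


# Hypothetical viewing habits: minutes watched per time period.
habits = {"morning": 45, "daytime": 0, "evening": 120, "late_night": 0}
average = {"child": 0.4, "adult": 0.6}
session = {"teen": 0.9, "adult": 0.1}

flagged = is_out_of_habits(habits, "late_night")
excluded = flagged and should_exclude(average, session)
```

A session flagged and excluded in this way can still be retained as session demographic data without being folded into the household average.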

FIG. 15 illustrates the average and session household 
demographic characteristics. A household demographic
parameters column 1501 is followed by an average value
column 1505, a session value column 1503, and an update
column 1507. The average value column 1505 and the session
value column 1503 are derived from the household demographic
characterization process 1400. The deterministic parameters
such as address and telephone numbers can be obtained from
an outside source or can be loaded into the system by the
subscriber or a network operator at the time of
installation. Updating of deterministic values is prevented
by indicating that these values should not be updated in the
update column 1507. 
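The role of the update column can be illustrated with the following sketch, in which rows marked as non-updatable are left untouched. The table layout, the parameter names, and the running-average formula are assumptions for the example; the specification does not state how the average value is maintained.

```python
def update_averages(table, session_values, sessions_seen):
    """Fold session values into running averages, skipping locked rows.

    table: {parameter: {"average": value, "session": value, "update": bool}}
    sessions_seen: number of sessions already reflected in the averages.
    """
    for parameter, row in table.items():
        if not row["update"] or parameter not in session_values:
            continue  # deterministic values are never overwritten
        row["session"] = session_values[parameter]
        row["average"] = (
            row["average"] * sessions_seen + row["session"]
        ) / (sessions_seen + 1)
    return table


# Hypothetical table with one deterministic and one probabilistic row.
table = {
    "telephone": {"average": "555-0100", "session": None, "update": False},
    "household_size": {"average": 3.0, "session": None, "update": True},
}
update_averages(table, {"telephone": "555-0199", "household_size": 4.0}, 3)
```

After the call, the telephone entry is unchanged while the household size average moves from 3.0 to 3.25.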

FIG. 16 illustrates an entity-relationship diagram for
the generation of the household interest profile 180 in a
household interest profile generation process 1600. In a
preferred embodiment, the household interest profile
generation process comprises averaging the household session
interest profile 1320 over multiple sessions and applying
the household viewing habits 195 in combination with the
heuristic rules 160 to form the household interest profile
180, which takes into account both the viewing preferences of
the household and assumptions about
households/subscribers with those viewing habits and program
preferences.
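The averaging step over multiple sessions can be sketched as below. Only the averaging is shown; the subsequent application of the heuristic rules 160 is omitted. The function name and category names are hypothetical, and an unweighted mean is assumed.

```python
def average_session_profiles(session_profiles):
    """Unweighted mean of several session interest profiles.

    session_profiles: list of {category: probability} dictionaries.
    A category missing from a session is treated as zero for it.
    """
    categories = set().union(*session_profiles)
    count = len(session_profiles)
    return {
        c: sum(p.get(c, 0.0) for p in session_profiles) / count
        for c in sorted(categories)
    }


# Two hypothetical sessions for the same household.
sessions = [
    {"news": 0.6, "sports": 0.4},
    {"news": 0.2, "movies": 0.8},
]
household_profile = average_session_profiles(sessions)
```

With these sessions the averaged profile is about 0.4 for news, 0.2 for sports, and 0.4 for movies.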

FIG. 17 illustrates the household interest profile 180,
which is composed of a programming types row 1709, a
product types row 1707, a household interests column
1701, an average value column 1703, and a session value
column 1705.

The product types row 1707 gives an indication as to
what type of advertisement the household would be interested
in watching, thus indicating what types of products could
potentially be advertised with a high probability of the
advertisement being watched in its entirety. The
programming types row 1709 suggests what kind of programming
the household is likely to be interested in watching. The
household interests column 1701 specifies the types of
programming and products which are statistically
characterized for that household.

As an example of the industrial applicability of the
invention, a household will perform its normal viewing
routine without being requested to answer specific questions
regarding likes and dislikes. Children may watch television
in the morning in the household, and may change channels
during commercials, or not at all. The television may
remain off during the working day, while the children are at
school and day care, and be turned on again in the evening,
at which time the parents may "surf" channels, mute the
television during commercials, and ultimately watch one or
two hours of broadcast programming. The present invention
provides the ability to characterize the household, and may
make the determination that there are children and adults in
the household, with program and product interests indicated
in the household interest profile 180 corresponding to a
family of that composition. A household with two retired
adults will have a completely different characterization
which will be indicated in the household interest profile
180.

Although the present invention has been largely
described in the context of a single computing platform
receiving programming, the SCS 100 can be realized as part
of a client-server architecture, as illustrated in FIG. 18.
Referring to FIG. 18, residence 1800 contains a personal
computer (PC) 1820 as well as the combination of a
television 1810 and a set-top 1808, which can request and
receive programming. The equipment in residence 1800, or
similar equipment in a small or large business environment,
forms the client side of the network as defined herein.
Programming is delivered over an access network 1830, which
may be a cable television network, telephone type network,
or other access network. Information requests are made by
the client side to a server 1840 which forms the server side
of the network. Server 1840 has content locally which it
provides to the subscriber, or requests content on behalf of
the subscriber from a third party content provider 1860, as
illustrated in FIG. 18. Requests made on behalf of the
client side by server 1840 are made across a wide area
network 1850 which can be the Internet or other public or
private network. Techniques for making requests on behalf of
a client are frequently referred to as proxy techniques and
are well known to those skilled in the art. The server side
receives the requested programming which is displayed on PC
1820 or television 1810 according to which device made the
request.

In a preferred embodiment the server 1840 maintains the 
subscriber selection data 110 which it is able to compile 
based on its operation as a proxy for the client side. 
Retrieval of source related information, the program
characterization process 800, the program target analysis
process 1100, the session characterization process 1300,
the household demographic characterization process 1400,
and the household interest profile generation process 1600
can be performed by server 1840.

Referring to FIG. 19, an advertisement monitoring table
is illustrated, in which an advertisement ID (AD ID) column
1915 contains a numerical ID for an advertisement which was
transmitted with the advertisement in the form of a Program
ID, http address, or other identifier which is uniquely
associated with the advertisement. A product column 1921
contains a product description which indicates the type of
product that was advertised. A brand column 1927 indicates
the brand name of the product or can alternatively list a
generic name for that product. A percent watched column 1933
indicates the percentage of the advertisement the subscriber
viewed. In an alternate embodiment, a letter rating or
other type of rating is used to indicate the probability
that the advertisement was watched. A volume column 1937
indicates the volume level at which the advertisement was
watched.
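One row of such an advertisement monitoring table might be represented as in the sketch below. The field names mirror the columns described above, but the class name, the helper function, the example values, and the derivation of the percentage from viewing time in seconds are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdRecord:
    """One row of a (hypothetical) advertisement monitoring table."""
    ad_id: str              # identifier transmitted with the advertisement
    product: str            # type of product advertised
    brand: str              # brand name or generic name
    percent_watched: float  # 0.0 to 100.0
    volume: float           # volume level at which the ad was watched


def make_record(ad_id, product, brand, seconds_viewed, ad_length_seconds, volume):
    """Build a record, deriving percent watched from viewing time."""
    if ad_length_seconds <= 0:
        raise ValueError("advertisement length must be positive")
    # Clamp the viewed time to the length of the advertisement.
    viewed = min(max(seconds_viewed, 0), ad_length_seconds)
    percent = 100.0 * viewed / ad_length_seconds
    return AdRecord(ad_id, product, brand, percent, volume)


# Hypothetical advertisement watched for 15 of its 30 seconds.
record = make_record("AD-0001", "automobile", "ExampleBrand", 15, 30, 0.4)
```

In this example the record carries a percent watched value of 50.0.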

As an example of the industrial applicability of the
invention, a manufacturer may develop an advertising
strategy which includes the insertion of advertisements
during popular evening programs. The costs for such ad
insertions can be extremely high. In order to ensure the
cost effectiveness of this advertising strategy, the
manufacturer has the advertisements placed during less
watched but similar programs and monitors how subscribers
react, and can determine approximately how many times the
advertisement has been watched out of all of the possible
viewings. This data can be used to confirm the potential
effectiveness of the advertisement and to subsequently
determine if purchasing the more expensive time during
evening programming will be cost-effective, or if the
advertisement should be modified or placed in other
programming.

Continuing this example, the manufacturer may place an 
advertisement for viewing during "prime time" for an initial 
period but can subsequently cancel broadcasts of the 
advertisement if it is found that the majority of 
subscribers never see the advertisement. 
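The kind of aggregate measurement described in this example can be sketched as follows. The 90 percent cutoff for counting an advertisement as watched, the function names, and the sample data are assumptions for illustration and do not come from the specification.

```python
def viewing_rate(percent_watched_values, watched_threshold=90.0):
    """Fraction of possible viewings in which the ad counts as watched.

    percent_watched_values: one percent-watched value per possible viewing.
    watched_threshold: assumed cutoff for treating the ad as watched.
    """
    if not percent_watched_values:
        return 0.0
    watched = sum(1 for p in percent_watched_values if p >= watched_threshold)
    return watched / len(percent_watched_values)


def majority_never_saw(percent_watched_values):
    """True if more than half of the possible viewings were at 0 percent."""
    unseen = sum(1 for p in percent_watched_values if p == 0)
    return unseen > len(percent_watched_values) / 2


# Hypothetical percent-watched values collected from five subscribers.
samples = [100, 0, 0, 95, 0]
rate = viewing_rate(samples)
cancel = majority_never_saw(samples)
```

With this sample data, two of five possible viewings count as watched and three subscribers never saw the advertisement, so the manufacturer in the example might cancel further broadcasts.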

Although this invention has been illustrated by 
reference to specific embodiments, it will be apparent to 
those skilled in the art that various changes and 
modifications may be made which clearly fall within the 
scope of the invention. The invention is intended to be 
protected broadly within the spirit and scope of the 
appended claims.

Claims

What is claimed is: 

1. A data processing system for monitoring advertisements 
watched by a subscriber, said data processing system 
comprising:

(a) computer processor means for processing data;

(b) storage means for storing data on a storage
medium;

(c) first means for monitoring subscriber
activity wherein said first means includes
recording means for storing subscriber
selections;

(d) second means for retrieving advertisement 
related information wherein said advertisement 
related information contains descriptive fields 
corresponding to said advertisement; 

(e) third means for processing information 
wherein said third means includes means for 
determining the extent to which an advertisement 
is viewed by said subscriber; and 

(f) fourth means for storing said descriptive 
fields and said determination of the extent to 
which said advertisement is viewed by said 
subscriber. 

2. The system described in claim 1 wherein said first means 
for monitoring subscriber activity further comprises means 
for monitoring volume levels wherein said volume levels 
correspond to subscriber selection volume levels. 

3. The system described in claim 1 further comprising: 

(g) fifth means for determining a subscriber 
product interests profile; and

(h) sixth means for storing said subscriber
product interests profile.

4. The system described in claim 1 wherein said second
means for retrieving advertisement related information
further comprises a means for context mining of textual
information associated with said selected source material.

5. The system described in claim 4 wherein said textual
information is text derived from closed-captioning data
associated with said advertisement.

6. The system described in claim 5 wherein said text
derived from closed-captioning data associated with said
advertisement includes a product name field.

7. The system described in claim 4 wherein said text
derived from closed-captioning data associated with said
advertisement includes a product brand field.

8. A client-server based data processing system for 
monitoring advertisements watched by a subscriber, said 
client-server based data processing system comprising: 

(a) first computer processor means at a client 
side for receiving and displaying advertisements 
wherein said first computer means is capable of 
transmitting channel change requests; 

(b) second computer processor means at a server 
side for receiving said channel change requests 
and for processing data; 

(c) second storage means associated with second 
computer processor means for storing data on a 
storage medium;

(d) first means at said server side for
monitoring subscriber activity wherein said 
first means for monitoring subscriber activity 
includes receiving means for receiving 
subscriber channel change requests, recording 
means for storing subscriber channel change 
requests; 

(e) second means at said server side for 
retrieving advertisement related information 
wherein said advertisement related information 
contains descriptive fields corresponding to an 
advertisement;

(f) third means at said server side for 
processing information wherein said third means 
includes means for determining the extent to 
which an advertisement is viewed by said 
subscriber; and 

(g) fourth means at said server side for storing 
said descriptive fields and said determination 
of the extent to which said advertisement is 
viewed by said subscriber. 

9. The system described in claim 8 further comprising:

(h) fifth means for determining a subscriber 
product interests profile; and 

(i) sixth means for storing said subscriber 
product interests profile. 

10. The system described in claim 8 wherein said second 
means for retrieving advertisement related information 
further comprises a means for context mining of textual 
information associated with said selected source material.

11. The system described in claim 10 wherein said textual
information is text derived from closed-captioning data
associated with said advertisement.

12. The system described in claim 11 wherein said text
derived from closed-captioning data associated with said
advertisement includes a product name field.

13. The system described in claim 11 wherein said text
derived from closed-captioning data associated with said
advertisement includes a product brand field.

14. A data processing system for generating a subscriber
profile vector, said data processing system comprising: 

(a) computer processor means for processing data; 

(b) storage means for storing data on a storage 
medium; 

(c) first means for monitoring subscriber 
activity wherein said first means includes 
recording means for storing subscriber selection 
data wherein said subscriber selection data 
corresponds to selected source material; 

(d) second means for retrieving source related 
information wherein said source related 
information contains descriptive fields 
corresponding to said selected source material; 

(e) third means for processing information 
wherein said third means includes means for 
processing said subscriber selection data with 
respect to said descriptive fields to form said 
subscriber profile vector; and 

(f) fourth means for storing said subscriber 
profile vector.

15. The system described in claim 14 wherein said first
means for monitoring subscriber activity further comprises
means for monitoring time durations wherein said time
durations correspond to viewing times of said selected
source material.

16. The system described in claim 14 wherein said first 
means for monitoring subscriber activity further comprises 
means for monitoring volume levels wherein said volume 
levels correspond to subscriber selection volume levels. 



17. The system described in claim 14 wherein said
subscriber profile vector contains household demographic
data indicating probabilistic measurements of household
demographics.

18. The system described in claim 14 wherein said
subscriber profile vector contains household program
preference information indicating probabilistic measurements
of household program interests.

19. The system described in claim 14 wherein said 
subscriber profile vector contains household product 
preference information indicating probabilistic measurements 
of household product interests. 



20. The system described in claim 14 wherein said second 
means for retrieving source related information further 
comprises a means for context mining of textual information 
associated with said selected source material. 

21. The system described in claim 20 wherein said textual 
information is text derived from closed-captioning data 
associated with said selected source material.

22. The system described in claim 14 wherein said second
means for retrieving source related information further
comprises a means for retrieving information associated with
said selected source material from an electronic program
guide.

23. The system described in claim 14 wherein said third
means for processing information processes information over
a viewing session and wherein said subscriber profile vector
corresponds to said viewing session.

24. The system described in claim 14 wherein said third
means for processing information processes information over
multiple viewing sessions and wherein said subscriber
profile vector corresponds to an average value over said
multiple viewing sessions.



25. A data processing system for generating a subscriber 
profile vector, said data processing system comprising: 

(a) computer processor means for processing data; 

(b) storage means for storing data on a storage
medium;

(c) first means for monitoring subscriber
activity wherein said first means includes
recording means for storing subscriber selection
data wherein said subscriber selection data
corresponds to selected source material;

(d) second means for retrieving source related
information wherein said source related
information contains descriptive fields
corresponding to said selected source material;

(e) third means for generating a program
characteristics vector based on said source 
related information; 

(f) fourth means for storing a set of heuristic 
rules; 

(g) fifth means for processing information 
wherein said fifth means includes means for 
processing said subscriber selection data with 
respect to said program characteristics vector 
and said set of heuristic rules to form said 
subscriber profile vector; and 

(h) sixth means for storing said subscriber 
profile vector. 

26. The system described in claim 25 wherein said first 
means for monitoring subscriber activity further comprises 
means for monitoring time durations wherein said time 
durations correspond to viewing times of said selected 
source material.

27. The system described in claim 25 wherein said first 
means for monitoring subscriber activity further comprises 
means for monitoring volume levels wherein said volume 
levels correspond to subscriber selection volume levels. 

28. The system described in claim 25 wherein said 
subscriber profile vector contains household demographic 
data indicating probabilistic measurements of household 
demographics.

29. The system described in claim 25 wherein said 
subscriber profile vector contains a household session 
interest profile indicating probabilistic measurements of 
household interests.

30. A data processing system for generating a household
demographic characteristics vector, said data processing 
system comprising: 

(a) computer processor means for processing data; 

(b) storage means for storing data on a storage 
medium; 

(c) first means for monitoring subscriber 
activity wherein said first means includes
recording means for storing subscriber selection
data wherein said subscriber selection data
corresponds to selected source material;

(d) second means for generating household viewing 
habits information wherein said household 
viewing habits information is generated from 
said subscriber selection data; 

(e) third means for storing a set of heuristic 
rules;

(f) fourth means for processing information 
wherein said fourth means includes means for 
processing said subscriber selection data with 
respect to said set of heuristic rules to form 
said household demographic characteristics 
vector; and 

(g) fifth means for storing said household 
demographic characteristics vector. 

31. The system described in claim 30 wherein said fourth 
means for processing information processes information over 
a viewing session and wherein said household demographic 
characteristics vector corresponds to said viewing session. 

32. The system described in claim 30 wherein said fourth
means for processing information processes information over
a period of multiple viewing sessions wherein said household
demographic characteristics vector corresponds to an average 
value over said multiple viewing sessions. 

33. A data processing system for generating a subscriber 
profile vector in a client-server based architecture, said 
data processing system comprising: 

(a) first computer processor means at a client 
side for requesting and displaying source 
information wherein said first computer means 
transmits a request for source material and 
receives and displays said source material; 

(b) second computer processor means at a server 
side for processing data; 

(c) second storage means associated with second 
computer processor means for storing data on a 
storage medium; 

(d) first means at said server side for 
monitoring subscriber activity wherein said 
first means for monitoring subscriber activity 
includes receiving means for receiving 
subscriber requests for said source material, 
recording means for storing subscriber selection 
data wherein said subscriber selection data 
corresponds to a record of requests for said 
source material;

(e) second means at said server side for 
retrieving source related information wherein 
said source related information contains 
descriptive fields corresponding to said source
material;

(f) third means at said server side for
processing information wherein said third means
includes means for processing said subscriber
selection data with respect to said descriptive
fields to form said subscriber profile vector;
and

(g) fourth means at said server side for storing 
said subscriber profile vector. 

34. The system described in claim 33 wherein said first 
means for monitoring subscriber activity further comprises 
means for monitoring time durations wherein said time 
durations correspond to viewing times of said selected 
source material.

35. The system described in claim 33 wherein said first 
means for monitoring subscriber activity further comprises 
means for monitoring volume levels wherein said volume 
levels correspond to subscriber selection volume levels. 

36. The system described in claim 33 wherein said 
subscriber profile vector contains household demographic 
data indicating probabilistic measurements of household 
demographics.

37. The system described in claim 33 wherein said 
subscriber profile vector contains household program 
preference information indicating probabilistic measurements 
of household program interests. 

38. The system described in claim 33 wherein said 
subscriber profile vector contains household product 
preference information indicating probabilistic measurements 
of household product interests. 



39. The system described in claim 33 wherein said second 
means for retrieving source related information further
comprises a means for context mining of textual information
associated with said selected source material.

40. The system described in claim 39 wherein said textual
information is text derived from closed-captioning data
associated with said selected source material.

41. The system described in claim 33 wherein said second
means for retrieving source related information further
comprises a means for retrieving information associated with
said selected source material from an electronic program
guide.

42. The system described in claim 33 wherein said third 
means for processing information processes information over 
a viewing session and wherein said subscriber profile vector 
corresponds to said viewing session. 

43. The system described in claim 33 wherein said third 
means for processing information processes information over 
multiple viewing sessions and wherein said subscriber 
profile vector corresponds to an average value over said 
multiple viewing sessions. 

44. A data processing system for generating a subscriber 
profile vector in a client-server based architecture, said
data processing system comprising: 

(a) first computer processor means at a client 
side for requesting and displaying source 
information wherein said first computer means 
transmits a request for source material and 
receives and displays said source material; 

(b) second computer processor means at a server 
side for processing data;

(c) second storage means associated with second
computer processor means for storing data on a
storage medium;

(d) first means at said server side for
monitoring subscriber activity wherein said
first means for monitoring subscriber activity
includes receiving means for receiving
subscriber requests for said source material,
recording means for storing subscriber selection
data wherein said subscriber selection data
corresponds to a record of requests for said
source material;

(e) second means at said server side for
retrieving source related information wherein
said source related information contains
descriptive fields corresponding to said source
material;

(f) third means at said server side for 
generating a program characteristics vector 
based on said source related information; 

(g) fourth means at said server side for storing 
a set of heuristic rules; 

(h) fifth means at said server side for
processing information wherein said fifth means 
includes means for processing said subscriber 
selection data with respect to said program 
characteristics vector and said set of heuristic 
rules to form said subscriber profile vector; 
and 

(i) sixth means at said server side for storing 
said subscriber profile vector. 



45. The system described in claim 44 wherein said first 
means for monitoring subscriber activity further comprises
means for monitoring time durations wherein said time
durations correspond to viewing times of said selected 
source material. 

46. The system described in claim 44 wherein said first 
means for monitoring subscriber activity further comprises 
means for monitoring volume levels wherein said volume 
levels correspond to subscriber selection volume levels. 

47. The system described in claim 44 wherein said 
subscriber profile vector contains household demographic 
data indicating probabilistic measurements of household 
demographics.

48. The system described in claim 44 wherein said 
subscriber profile vector contains a household session 
interest profile indicating probabilistic measurements of 
household interests. 

49. A data processing system for generating a household 
demographic characteristics vector in a client-server based 
architecture, said data processing system comprising: 

(a) first computer processor means at a client 
side for requesting and displaying source 
information wherein said first computer means 
transmits a request for source material and 
receives and displays said source material;

(b) second computer processor means at a server 
side for processing data; 

(c) first means at said server side for 
monitoring subscriber activity wherein said 
first means includes recording means for storing 
subscriber selection data wherein said
subscriber selection data corresponds to
selected source material; 

(d) second means at said client side for 
generating household viewing habits information 
wherein said household viewing habits 
information is generated from said subscriber 
selection data; 

(e) third means at said server side for storing a 
set of heuristic rules; 

(f) fourth means at said server side for 
processing information wherein said fourth means 
includes means for processing said subscriber 
selection data with respect to said set of 
heuristic rules to form said household 
demographic characteristics vector; and 

(g) fifth means at said server side for storing 
said household demographic characteristics 
vector. 

50. The system described in claim 49 wherein said fourth 
means for processing information processes information over 
a viewing session and wherein said household demographic 
characteristics vector corresponds to said viewing session. 

51. The system described in claim 49 wherein said fourth 
means for processing information processes information over 
a period of multiple viewing sessions wherein said household 
demographic characteristics vector corresponds to an average 
value over said multiple viewing sessions.